Overview

Dataset statistics

Number of variables: 25
Number of observations: 30000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 10.7 MiB
Average record size in memory: 374.2 B

Variable types

NUM: 21
CAT: 4

Reproduction

Analysis started: 2020-05-08 20:18:04.017895
Analysis finished: 2020-05-08 20:20:44.432077
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field [High Correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 [High Correlation]
BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field [High Correlation]
BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields [High Correlation]
BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field [High Correlation]
BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field [High Correlation]
PAY_AMT2 is highly skewed (γ1 = 30.45381745) [Skewed]
PAY_0 has 14737 (49.1%) zeros [Zeros]
PAY_2 has 15730 (52.4%) zeros [Zeros]
PAY_3 has 15764 (52.5%) zeros [Zeros]
PAY_4 has 16455 (54.9%) zeros [Zeros]
PAY_5 has 16947 (56.5%) zeros [Zeros]
PAY_6 has 16286 (54.3%) zeros [Zeros]
BILL_AMT1 has 2008 (6.7%) zeros [Zeros]
BILL_AMT2 has 2506 (8.4%) zeros [Zeros]
BILL_AMT3 has 2870 (9.6%) zeros [Zeros]
BILL_AMT4 has 3195 (10.7%) zeros [Zeros]
BILL_AMT5 has 3506 (11.7%) zeros [Zeros]
BILL_AMT6 has 4020 (13.4%) zeros [Zeros]
PAY_AMT1 has 5249 (17.5%) zeros [Zeros]
PAY_AMT2 has 5396 (18.0%) zeros [Zeros]
PAY_AMT3 has 5968 (19.9%) zeros [Zeros]
PAY_AMT4 has 6408 (21.4%) zeros [Zeros]
PAY_AMT5 has 6703 (22.3%) zeros [Zeros]
PAY_AMT6 has 7173 (23.9%) zeros [Zeros]

Variables

ID
Real number (ℝ≥0)

UNIFORM
UNIQUE
Distinct count: 30000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15000.5
Minimum: 1
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1500.95
Q1: 7500.75
Median: 15000.5
Q3: 22500.25
95-th percentile: 28500.05
Maximum: 30000
Range: 29999
Interquartile range (IQR): 14999.5

Descriptive statistics

Standard deviation: 8660.398374
Coefficient of variation (CV): 0.5773406469
Kurtosis: -1.2
Mean: 15000.5
Mean Absolute Deviation (MAD): 7500
Skewness: 0
Sum: 450015000
Variance: 75002500
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[1.e+00 3.e+04], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
2047 1 < 0.1%
 
1322 1 < 0.1%
 
15629 1 < 0.1%
 
9486 1 < 0.1%
 
11535 1 < 0.1%
 
21792 1 < 0.1%
 
23841 1 < 0.1%
 
17698 1 < 0.1%
 
19747 1 < 0.1%
 
29988 1 < 0.1%
 
Other values (29990) 29990 > 99.9%
 
ValueCountFrequency (%) 
1 1 < 0.1%
 
2 1 < 0.1%
 
3 1 < 0.1%
 
4 1 < 0.1%
 
5 1 < 0.1%
 
ValueCountFrequency (%) 
30000 1 < 0.1%
 
29999 1 < 0.1%
 
29998 1 < 0.1%
 
29997 1 < 0.1%
 
29996 1 < 0.1%
 

LIMIT_BAL
Real number (ℝ≥0)

Distinct count: 81
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167484.3227
Minimum: 10000
Maximum: 1000000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
Median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129747.6616
Coefficient of variation (CV): 0.7746854124
Kurtosis: 0.5362628964
Mean: 167484.3227
Mean Absolute Deviation (MAD): 104957.0008
Skewness: 0.9928669605
Sum: 5024529680
Variance: 1.683445568e+10
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 10000. 13000. 18000. 25000. 35000. ... 505000. 525000. 645000. 755000. 1000000.], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
50000 3365 11.2%
 
20000 1976 6.6%
 
30000 1610 5.4%
 
80000 1567 5.2%
 
200000 1528 5.1%
 
150000 1110 3.7%
 
100000 1048 3.5%
 
180000 995 3.3%
 
360000 881 2.9%
 
60000 825 2.8%
 
Other values (71) 15095 50.3%
 
ValueCountFrequency (%) 
10000 493 1.6%
 
16000 2 < 0.1%
 
20000 1976 6.6%
 
30000 1610 5.4%
 
40000 230 0.8%
 
ValueCountFrequency (%) 
1000000 1 < 0.1%
 
800000 2 < 0.1%
 
780000 2 < 0.1%
 
760000 1 < 0.1%
 
750000 4 < 0.1%
 

SEX
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
female 18112 60.4%
 
male 11888 39.6%
 

Length

Max length: 6
Mean length: 5.207466667
Min length: 4
ValueCountFrequency (%) 
Lowercase_Letter 5 100.0%
 
ValueCountFrequency (%) 
Latin 5 100.0%
 
ValueCountFrequency (%) 
ASCII 5 100.0%
 

EDUCATION
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
university 14030 46.8%
 
graduate school 10585 35.3%
 
high school 4917 16.4%
 
other 468 1.6%
 

Length

Max length: 15
Mean length: 11.85006667
Min length: 5
ValueCountFrequency (%) 
Lowercase_Letter 16 94.1%
 
Space_Separator 1 5.9%
 
ValueCountFrequency (%) 
Latin 16 94.1%
 
Common 1 5.9%
 
ValueCountFrequency (%) 
ASCII 17 100.0%
 

MARRIAGE
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
2 15964 53.2%
 
1 13659 45.5%
 
3 323 1.1%
 
0 54 0.2%
 

Length

Max length: 1
Mean length: 1
Min length: 1
ValueCountFrequency (%) 
Decimal_Number 4 100.0%
 
ValueCountFrequency (%) 
Common 4 100.0%
 
ValueCountFrequency (%) 
ASCII 4 100.0%
 

AGE
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.4855
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
Median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.217904068
Coefficient of variation (CV): 0.2597653709
Kurtosis: 0.04430337824
Mean: 35.4855
Mean Absolute Deviation (MAD): 7.546117967
Skewness: 0.7322458688
Sum: 1064565
Variance: 84.96975541
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[21. 21.5 22.5 23.5 26.5 ... 58.5 61.5 66.5 70.5 79. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
29 1605 5.3%
 
27 1477 4.9%
 
28 1409 4.7%
 
30 1395 4.7%
 
26 1256 4.2%
 
31 1217 4.1%
 
25 1186 4.0%
 
34 1162 3.9%
 
32 1158 3.9%
 
33 1146 3.8%
 
Other values (46) 16989 56.6%
 
ValueCountFrequency (%) 
21 67 0.2%
 
22 560 1.9%
 
23 931 3.1%
 
24 1127 3.8%
 
25 1186 4.0%
 
ValueCountFrequency (%) 
79 1 < 0.1%
 
75 3 < 0.1%
 
74 1 < 0.1%
 
73 4 < 0.1%
 
72 3 < 0.1%
 

PAY_0
Real number (ℝ)

ZEROS
Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.0167
Minimum: -2
Maximum: 8
Zeros: 14737
Zeros (%): 49.1%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.123801528
Coefficient of variation (CV): -67.29350467
Kurtosis: 2.720715042
Mean: -0.0167
Mean Absolute Deviation (MAD): 0.7375312333
Skewness: 0.7319749269
Sum: -501
Variance: 1.262929874
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-2. -0.5 0.5 1.5 2.5 3.5 4.5 5.5 7.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 14737 49.1%
 
-1 5686 19.0%
 
1 3688 12.3%
 
-2 2759 9.2%
 
2 2667 8.9%
 
3 322 1.1%
 
4 76 0.3%
 
5 26 0.1%
 
8 19 0.1%
 
6 11 < 0.1%
 
ValueCountFrequency (%) 
-2 2759 9.2%
 
-1 5686 19.0%
 
0 14737 49.1%
 
1 3688 12.3%
 
2 2667 8.9%
 
ValueCountFrequency (%) 
8 19 0.1%
 
7 9 < 0.1%
 
6 11 < 0.1%
 
5 26 0.1%
 
4 76 0.3%
 

PAY_2
Real number (ℝ)

ZEROS
Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1337666667
Minimum: -2
Maximum: 8
Zeros: 15730
Zeros (%): 52.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.197185973
Coefficient of variation (CV): -8.949807922
Kurtosis: 1.57041773
Mean: -0.1337666667
Mean Absolute Deviation (MAD): 0.8199204089
Skewness: 0.7905650222
Sum: -4013
Variance: 1.433254254
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-2. -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 7.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 15730 52.4%
 
-1 6050 20.2%
 
2 3927 13.1%
 
-2 3782 12.6%
 
3 326 1.1%
 
4 99 0.3%
 
1 28 0.1%
 
5 25 0.1%
 
7 20 0.1%
 
6 12 < 0.1%
 
ValueCountFrequency (%) 
-2 3782 12.6%
 
-1 6050 20.2%
 
0 15730 52.4%
 
1 28 0.1%
 
2 3927 13.1%
 
ValueCountFrequency (%) 
8 1 < 0.1%
 
7 20 0.1%
 
6 12 < 0.1%
 
5 25 0.1%
 
4 99 0.3%
 

PAY_3
Real number (ℝ)

ZEROS
Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1662
Minimum: -2
Maximum: 8
Zeros: 15764
Zeros (%): 52.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.196867568
Coefficient of variation (CV): -7.201369245
Kurtosis: 2.084435875
Mean: -0.1662
Mean Absolute Deviation (MAD): 0.8294784933
Skewness: 0.8406818269
Sum: -4986
Variance: 1.432491976
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-2. -1.5 -0.5 0.5 1.5 2.5 3.5 4.5 7.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 15764 52.5%
 
-1 5938 19.8%
 
-2 4085 13.6%
 
2 3819 12.7%
 
3 240 0.8%
 
4 76 0.3%
 
7 27 0.1%
 
6 23 0.1%
 
5 21 0.1%
 
1 4 < 0.1%
 
ValueCountFrequency (%) 
-2 4085 13.6%
 
-1 5938 19.8%
 
0 15764 52.5%
 
1 4 < 0.1%
 
2 3819 12.7%
 
ValueCountFrequency (%) 
8 3 < 0.1%
 
7 27 0.1%
 
6 23 0.1%
 
5 21 0.1%
 
4 76 0.3%
 

PAY_4
Real number (ℝ)

ZEROS
Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2206666667
Minimum: -2
Maximum: 8
Zeros: 16455
Zeros (%): 54.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.169138622
Coefficient of variation (CV): -5.29821128
Kurtosis: 3.496983496
Mean: -0.2206666667
Mean Absolute Deviation (MAD): 0.8112406667
Skewness: 0.9996294133
Sum: -6620
Variance: 1.366885118
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-2. -0.5 0.5 1.5 2.5 ... 4.5 5.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 16455 54.9%
 
-1 5687 19.0%
 
-2 4348 14.5%
 
2 3159 10.5%
 
3 180 0.6%
 
4 69 0.2%
 
7 58 0.2%
 
5 35 0.1%
 
6 5 < 0.1%
 
8 2 < 0.1%
 
ValueCountFrequency (%) 
-2 4348 14.5%
 
-1 5687 19.0%
 
0 16455 54.9%
 
1 2 < 0.1%
 
2 3159 10.5%
 
ValueCountFrequency (%) 
8 2 < 0.1%
 
7 58 0.2%
 
6 5 < 0.1%
 
5 35 0.1%
 
4 69 0.2%
 

PAY_5
Real number (ℝ)

ZEROS
Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2662
Minimum: -2
Maximum: 8
Zeros: 16947
Zeros (%): 56.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.133187406
Coefficient of variation (CV): -4.256902352
Kurtosis: 3.989748144
Mean: -0.2662
Mean Absolute Deviation (MAD): 0.7964248667
Skewness: 1.008197025
Sum: -7986
Variance: 1.284113697
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-2. -0.5 1. 2.5 3.5 4.5 5.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 16947 56.5%
 
-1 5539 18.5%
 
-2 4546 15.2%
 
2 2626 8.8%
 
3 178 0.6%
 
4 84 0.3%
 
7 58 0.2%
 
5 17 0.1%
 
6 4 < 0.1%
 
8 1 < 0.1%
 
ValueCountFrequency (%) 
-2 4546 15.2%
 
-1 5539 18.5%
 
0 16947 56.5%
 
2 2626 8.8%
 
3 178 0.6%
 
ValueCountFrequency (%) 
8 1 < 0.1%
 
7 58 0.2%
 
6 4 < 0.1%
 
5 17 0.1%
 
4 84 0.3%
 

PAY_6
Real number (ℝ)

ZEROS
Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2911
Minimum: -2
Maximum: 8
Zeros: 16286
Zeros (%): 54.3%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.149987626
Coefficient of variation (CV): -3.950489954
Kurtosis: 3.42653413
Mean: -0.2911
Mean Absolute Deviation (MAD): 0.8289434333
Skewness: 0.9480293916
Sum: -8733
Variance: 1.322471539
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-2. -1.5 -0.5 1. 2.5 3.5 4.5 6.5 7.5 8. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 16286 54.3%
 
-1 5740 19.1%
 
-2 4895 16.3%
 
2 2766 9.2%
 
3 184 0.6%
 
4 49 0.2%
 
7 46 0.2%
 
6 19 0.1%
 
5 13 < 0.1%
 
8 2 < 0.1%
 
ValueCountFrequency (%) 
-2 4895 16.3%
 
-1 5740 19.1%
 
0 16286 54.3%
 
2 2766 9.2%
 
3 184 0.6%
 
ValueCountFrequency (%) 
8 2 < 0.1%
 
7 46 0.2%
 
6 19 0.1%
 
5 13 < 0.1%
 
4 49 0.2%
 

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22723
Unique (%): 75.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51223.3309
Minimum: -165580
Maximum: 964511
Zeros: 2008
Zeros (%): 6.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -165580
5-th percentile: 0
Q1: 3558.75
Median: 22381.5
Q3: 67091
95-th percentile: 201203.05
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63532.25

Descriptive statistics

Standard deviation: 73635.86058
Coefficient of variation (CV): 1.437545339
Kurtosis: 9.806289341
Mean: 51223.3309
Mean Absolute Deviation (MAD): 50502.00599
Skewness: 2.663861022
Sum: 1536699927
Variance: 5422239963
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-165580. -14847. -6352.5 -2223. -1032.5 ... 311489. 390634.5 509866. 641760. 964511. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 2008 6.7%
 
390 244 0.8%
 
780 76 0.3%
 
326 72 0.2%
 
316 63 0.2%
 
2500 59 0.2%
 
396 49 0.2%
 
2400 39 0.1%
 
416 29 0.1%
 
500 25 0.1%
 
Other values (22713) 27336 91.1%
 
ValueCountFrequency (%) 
-165580 1 < 0.1%
 
-154973 1 < 0.1%
 
-15308 1 < 0.1%
 
-14386 1 < 0.1%
 
-11545 1 < 0.1%
 
ValueCountFrequency (%) 
964511 1 < 0.1%
 
746814 1 < 0.1%
 
653062 1 < 0.1%
 
630458 1 < 0.1%
 
626648 1 < 0.1%
 

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22346
Unique (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49179.07517
Minimum: -69777
Maximum: 983931
Zeros: 2506
Zeros (%): 8.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -69777
5-th percentile: 0
Q1: 2984.75
Median: 21200
Q3: 64006.25
95-th percentile: 194792.2
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61021.5

Descriptive statistics

Standard deviation: 71173.76878
Coefficient of variation (CV): 1.447236829
Kurtosis: 10.30294592
Mean: 49179.07517
Mean Absolute Deviation (MAD): 48673.54453
Skewness: 2.705220853
Sum: 1475372255
Variance: 5065705363
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-6.97770e+04 -9.48450e+03 -2.97650e+03 -1.04700e+03 -4.22500e+02 ... 3.24529e+05 4.00555e+05 5.12588e+05 6.01868e+05 9.83931e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 2506 8.4%
 
390 231 0.8%
 
326 75 0.2%
 
780 75 0.2%
 
316 72 0.2%
 
2500 51 0.2%
 
396 51 0.2%
 
2400 42 0.1%
 
-200 29 0.1%
 
416 28 0.1%
 
Other values (22336) 26840 89.5%
 
ValueCountFrequency (%) 
-69777 1 < 0.1%
 
-67526 1 < 0.1%
 
-33350 1 < 0.1%
 
-30000 1 < 0.1%
 
-26214 1 < 0.1%
 
ValueCountFrequency (%) 
983931 1 < 0.1%
 
743970 1 < 0.1%
 
671563 1 < 0.1%
 
646770 1 < 0.1%
 
624475 1 < 0.1%
 

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 22026
Unique (%): 73.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47013.1548
Minimum: -157264
Maximum: 1664089
Zeros: 2870
Zeros (%): 9.6%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -157264
5-th percentile: 0
Q1: 2666.25
Median: 20088.5
Q3: 60164.75
95-th percentile: 187821.05
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57498.5

Descriptive statistics

Standard deviation: 69349.38743
Coefficient of variation (CV): 1.475106015
Kurtosis: 19.78325514
Mean: 47013.1548
Mean Absolute Deviation (MAD): 46873.96302
Skewness: 3.087830046
Sum: 1410394644
Variance: 4809337537
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-1.572640e+05 -1.680800e+04 -5.286500e+03 -2.802000e+03 -1.065000e+03 ... 3.080855e+05 3.956640e+05 4.993875e+05 5.881930e+05 1.664089e+06], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 2870 9.6%
 
390 275 0.9%
 
780 74 0.2%
 
326 63 0.2%
 
316 62 0.2%
 
396 48 0.2%
 
2500 40 0.1%
 
2400 39 0.1%
 
416 29 0.1%
 
200 27 0.1%
 
Other values (22016) 26473 88.2%
 
ValueCountFrequency (%) 
-157264 1 < 0.1%
 
-61506 1 < 0.1%
 
-46127 1 < 0.1%
 
-34041 1 < 0.1%
 
-25443 1 < 0.1%
 
ValueCountFrequency (%) 
1664089 1 < 0.1%
 
855086 1 < 0.1%
 
693131 1 < 0.1%
 
689643 1 < 0.1%
 
689627 1 < 0.1%
 

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 21548
Unique (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43262.94897
Minimum: -170000
Maximum: 891586
Zeros: 3195
Zeros (%): 10.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -170000
5-th percentile: 0
Q1: 2326.75
Median: 19052
Q3: 54506
95-th percentile: 174333.35
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52179.25

Descriptive statistics

Standard deviation: 64332.85613
Coefficient of variation (CV): 1.487019671
Kurtosis: 11.30932483
Mean: 43262.94897
Mean Absolute Deviation (MAD): 43639.00712
Skewness: 2.821965291
Sum: 1297888469
Variance: 4138716378
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-170000. -25896.5 -6121.5 -2976.5 -1570. ... 320757.5 390539. 489393. 570919.5 891586. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 3195 10.7%
 
390 246 0.8%
 
780 101 0.3%
 
316 68 0.2%
 
326 62 0.2%
 
396 44 0.1%
 
150 39 0.1%
 
2400 39 0.1%
 
2500 34 0.1%
 
1000 33 0.1%
 
Other values (21538) 26139 87.1%
 
ValueCountFrequency (%) 
-170000 1 < 0.1%
 
-81334 1 < 0.1%
 
-65167 1 < 0.1%
 
-50616 1 < 0.1%
 
-46627 1 < 0.1%
 
ValueCountFrequency (%) 
891586 1 < 0.1%
 
706864 1 < 0.1%
 
628699 1 < 0.1%
 
616836 1 < 0.1%
 
572805 1 < 0.1%
 

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 21010
Unique (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40311.40097
Minimum: -81334
Maximum: 927171
Zeros: 3506
Zeros (%): 11.7%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -81334
5-th percentile: 0
Q1: 1763
Median: 18104.5
Q3: 50190.5
95-th percentile: 165794.3
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48427.5

Descriptive statistics

Standard deviation: 60797.15577
Coefficient of variation (CV): 1.508187617
Kurtosis: 12.30588129
Mean: 40311.40097
Mean Absolute Deviation (MAD): 41211.06439
Skewness: 2.876379867
Sum: 1209342029
Variance: 3696294150
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-81334. -10657.5 -5042. -1981.5 -1003. ... 265940. 311764. 370768. 520227. 927171. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 3506 11.7%
 
390 235 0.8%
 
780 94 0.3%
 
316 79 0.3%
 
326 62 0.2%
 
150 58 0.2%
 
396 47 0.2%
 
2400 39 0.1%
 
2500 37 0.1%
 
416 36 0.1%
 
Other values (21000) 25807 86.0%
 
ValueCountFrequency (%) 
-81334 1 < 0.1%
 
-61372 1 < 0.1%
 
-53007 1 < 0.1%
 
-46627 1 < 0.1%
 
-37594 1 < 0.1%
 
ValueCountFrequency (%) 
927171 1 < 0.1%
 
823540 1 < 0.1%
 
587067 1 < 0.1%
 
551702 1 < 0.1%
 
547880 1 < 0.1%
 

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS
Distinct count: 20604
Unique (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38871.7604
Minimum: -339603
Maximum: 961664
Zeros: 4020
Zeros (%): 13.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: -339603
5-th percentile: 0
Q1: 1256
Median: 17071
Q3: 49198.25
95-th percentile: 161912
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47942.25

Descriptive statistics

Standard deviation: 59554.10754
Coefficient of variation (CV): 1.53206613
Kurtosis: 12.27070529
Mean: 38871.7604
Mean Absolute Deviation (MAD): 40381.46803
Skewness: 2.846644576
Sum: 1166152812
Variance: 3546691724
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-339603. -54251.5 -24295. -6106. -3020. ... 311963.5 365432.5 439967.5 527638.5 961664. ], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 4020 13.4%
 
390 207 0.7%
 
780 86 0.3%
 
150 78 0.3%
 
316 77 0.3%
 
326 56 0.2%
 
396 45 0.1%
 
416 36 0.1%
 
-18 33 0.1%
 
2400 32 0.1%
 
Other values (20594) 25330 84.4%
 
ValueCountFrequency (%) 
-339603 1 < 0.1%
 
-209051 1 < 0.1%
 
-150953 1 < 0.1%
 
-94625 1 < 0.1%
 
-73895 1 < 0.1%
 
ValueCountFrequency (%) 
961664 1 < 0.1%
 
699944 1 < 0.1%
 
568638 1 < 0.1%
 
527711 1 < 0.1%
 
527566 1 < 0.1%
 

PAY_AMT1
Real number (ℝ≥0)

ZEROS
Distinct count: 7943
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5663.5805
Minimum: 0
Maximum: 873552
Zeros: 5249
Zeros (%): 17.5%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1000
Median: 2100
Q3: 5006
95-th percentile: 18428.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4006

Descriptive statistics

Standard deviation: 16563.28035
Coefficient of variation (CV): 2.924524575
Kurtosis: 415.2547427
Mean: 5663.5805
Mean Absolute Deviation (MAD): 5922.429753
Skewness: 14.66836433
Sum: 169907415
Variance: 274342256.1
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 6.500000e+00 1.750000e+01 1.635000e+02 ... 1.000740e+05 1.017880e+05 1.647010e+05 3.034075e+05 8.735520e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5249 17.5%
 
2000 1363 4.5%
 
3000 891 3.0%
 
5000 698 2.3%
 
1500 507 1.7%
 
4000 426 1.4%
 
10000 401 1.3%
 
1000 365 1.2%
 
2500 298 1.0%
 
6000 294 1.0%
 
Other values (7933) 19508 65.0%
 
ValueCountFrequency (%) 
0 5249 17.5%
 
1 9 < 0.1%
 
2 14 < 0.1%
 
3 15 0.1%
 
4 18 0.1%
 
ValueCountFrequency (%) 
873552 1 < 0.1%
 
505000 1 < 0.1%
 
493358 1 < 0.1%
 
423903 1 < 0.1%
 
405016 1 < 0.1%
 

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS
Distinct count: 7899
Unique (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5921.1635
Minimum: 0
Maximum: 1684259
Zeros: 5396
Zeros (%): 18.0%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 833
Median: 2009
Q3: 5000
95-th percentile: 19004.35
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4167

Descriptive statistics

Standard deviation: 23040.8704
Coefficient of variation (CV): 3.891274139
Kurtosis: 1641.631911
Mean: 5921.1635
Mean Absolute Deviation (MAD): 6478.832166
Skewness: 30.45381745
Sum: 177634905
Variance: 530881708.9
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 5.500000e+00 1.650000e+01 3.050000e+01 ... 1.000805e+05 1.500855e+05 2.066760e+05 4.082775e+05 1.684259e+06], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5396 18.0%
 
2000 1290 4.3%
 
3000 857 2.9%
 
5000 717 2.4%
 
1000 594 2.0%
 
1500 521 1.7%
 
4000 410 1.4%
 
10000 318 1.1%
 
6000 283 0.9%
 
2500 251 0.8%
 
Other values (7889) 19363 64.5%
 
ValueCountFrequency (%) 
0 5396 18.0%
 
1 15 0.1%
 
2 20 0.1%
 
3 18 0.1%
 
4 11 < 0.1%
 
ValueCountFrequency (%) 
1684259 1 < 0.1%
 
1227082 1 < 0.1%
 
1215471 1 < 0.1%
 
1024516 1 < 0.1%
 
580464 1 < 0.1%
 

PAY_AMT3
Real number (ℝ≥0)

ZEROS
Distinct count: 7518
Unique (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5225.6815
Minimum: 0
Maximum: 896040
Zeros: 5968
Zeros (%): 19.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 390
Median: 1800
Q3: 4505
95-th percentile: 17589.4
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4115

Descriptive statistics

Standard deviation: 17606.96147
Coefficient of variation (CV): 3.36931393
Kurtosis: 564.3112295
Mean: 5225.6815
Mean Absolute Deviation (MAD): 5866.072007
Skewness: 17.21663544
Sum: 156770445
Variance: 310005092.2
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 1.250000e+01 3.450000e+01 1.495000e+02 ... 1.000895e+05 1.642195e+05 2.376245e+05 4.092800e+05 8.960400e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 5968 19.9%
 
2000 1285 4.3%
 
1000 1103 3.7%
 
3000 870 2.9%
 
5000 721 2.4%
 
1500 490 1.6%
 
4000 381 1.3%
 
10000 312 1.0%
 
1200 243 0.8%
 
6000 241 0.8%
 
Other values (7508) 18386 61.3%
 
ValueCountFrequency (%) 
0 5968 19.9%
 
1 13 < 0.1%
 
2 19 0.1%
 
3 14 < 0.1%
 
4 15 0.1%
 
ValueCountFrequency (%) 
896040 1 < 0.1%
 
889043 1 < 0.1%
 
508229 1 < 0.1%
 
417588 1 < 0.1%
 
400972 1 < 0.1%
 

PAY_AMT4
Real number (ℝ≥0)

ZEROS
Distinct count: 6937
Unique (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4826.076867
Minimum: 0
Maximum: 621000
Zeros: 6408
Zeros (%): 21.4%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 296
Median: 1500
Q3: 4013.25
95-th percentile: 16014.95
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3717.25

Descriptive statistics

Standard deviation: 15666.15974
Coefficient of variation (CV): 3.246147995
Kurtosis: 277.3337677
Mean: 4826.076867
Mean Absolute Deviation (MAD): 5532.726692
Skewness: 12.90498482
Sum: 144782306
Variance: 245428561.1
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.00000e+00 5.00000e-01 6.50000e+00 1.85000e+01 9.95000e+01 ... 1.00052e+05 1.24744e+05 2.03538e+05 3.31385e+05 6.21000e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 6408 21.4%
 
1000 1394 4.6%
 
2000 1214 4.0%
 
3000 887 3.0%
 
5000 810 2.7%
 
1500 441 1.5%
 
4000 402 1.3%
 
10000 341 1.1%
 
2500 259 0.9%
 
500 258 0.9%
 
Other values (6927) 17586 58.6%
 
ValueCountFrequency (%) 
0 6408 21.4%
 
1 22 0.1%
 
2 22 0.1%
 
3 13 < 0.1%
 
4 20 0.1%
 
ValueCountFrequency (%) 
621000 1 < 0.1%
 
528897 1 < 0.1%
 
497000 1 < 0.1%
 
432130 1 < 0.1%
 
400046 1 < 0.1%
 

PAY_AMT5
Real number (ℝ≥0)

ZEROS
Distinct count: 6897
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4799.387633
Minimum: 0
Maximum: 426529
Zeros: 6703
Zeros (%): 22.3%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 252.5
Median: 1500
Q3: 4031.5
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3779

Descriptive statistics

Standard deviation: 15278.30568
Coefficient of variation (CV): 3.183386475
Kurtosis: 180.0639402
Mean: 4799.387633
Mean Absolute Deviation (MAD): 5482.146365
Skewness: 11.12741705
Sum: 143981629
Variance: 233426624.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 4.500000e+00 2.350000e+01 9.950000e+01 ... 9.991100e+04 1.000720e+05 1.100710e+05 2.153995e+05 4.265290e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 6703 22.3%
 
1000 1340 4.5%
 
2000 1323 4.4%
 
3000 947 3.2%
 
5000 814 2.7%
 
1500 426 1.4%
 
4000 401 1.3%
 
10000 343 1.1%
 
500 250 0.8%
 
6000 247 0.8%
 
Other values (6887) 17206 57.4%
 
ValueCountFrequency (%) 
0 6703 22.3%
 
1 21 0.1%
 
2 13 < 0.1%
 
3 13 < 0.1%
 
4 12 < 0.1%
 
ValueCountFrequency (%) 
426529 1 < 0.1%
 
417990 1 < 0.1%
 
388071 1 < 0.1%
 
379267 1 < 0.1%
 
332000 1 < 0.1%
 

PAY_AMT6
Real number (ℝ≥0)

ZEROS
Distinct count: 6939
Unique (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5215.502567
Minimum: 0
Maximum: 528666
Zeros: 7173
Zeros (%): 23.9%
Memory size: 234.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 117.75
Median: 1500
Q3: 4000
95-th percentile: 17343.8
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3882.25

Descriptive statistics

Standard deviation: 17777.46578
Coefficient of variation (CV): 3.408581541
Kurtosis: 167.1614296
Mean: 5215.502567
Mean Absolute Deviation (MAD): 6199.318675
Skewness: 10.64072733
Sum: 156465077
Variance: 316038289.4
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0.000000e+00 5.000000e-01 4.500000e+00 1.850000e+01 9.950000e+01 ... 1.000195e+05 1.223750e+05 2.013000e+05 2.889910e+05 5.286660e+05], "bayesian blocks" binning strategy used)
ValueCountFrequency (%) 
0 7173 23.9%
 
1000 1299 4.3%
 
2000 1295 4.3%
 
3000 914 3.0%
 
5000 808 2.7%
 
1500 439 1.5%
 
4000 411 1.4%
 
10000 356 1.2%
 
500 247 0.8%
 
6000 220 0.7%
 
Other values (6929) 16838 56.1%
 
ValueCountFrequency (%) 
0 7173 23.9%
 
1 20 0.1%
 
2 9 < 0.1%
 
3 14 < 0.1%
 
4 12 < 0.1%
 
ValueCountFrequency (%) 
528666 1 < 0.1%
 
527143 1 < 0.1%
 
443001 1 < 0.1%
 
422000 1 < 0.1%
 
403500 1 < 0.1%
 

default_next_mo
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.5 KiB
ValueCountFrequency (%) 
not default 23364 77.9%
 
default 6636 22.1%
 

Length

Max length: 11
Mean length: 10.1152
Min length: 7
ValueCountFrequency (%) 
Lowercase_Letter 9 90.0%
 
Space_Separator 1 10.0%
 
ValueCountFrequency (%) 
Latin 9 90.0%
 
Common 1 10.0%
 
ValueCountFrequency (%) 
ASCII 10 100.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
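
The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs: the function name `pearson_r` is hypothetical, the input values are the BILL_AMT1 and BILL_AMT2 entries of the first five sample rows, and constant inputs (zero variance) are not handled.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and of the two standard deviations
    # cancel out, so plain sums of (cross-)deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Illustrative input only: five rows are far too few for a meaningful estimate.
bill_amt1 = [3913, 2682, 29239, 46990, 8617]
bill_amt2 = [3102, 1725, 14027, 48233, 5670]
r = pearson_r(bill_amt1, bill_amt2)

# Shifting and rescaling one variable (with a positive scale) leaves r unchanged.
r_rescaled = pearson_r(bill_amt1, [0.5 * v + 1000 for v in bill_amt2])
```

Here `r` and `r_rescaled` agree up to floating-point rounding, which is the location and scale invariance mentioned above.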

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
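
As a rough sketch of that recipe, the snippet below ranks both variables and then applies the Pearson formula to the ranks. The helper names (`average_ranks`, `pearson_r`, `spearman_rho`) are hypothetical, and giving tied values the average of their positions is an assumption, since the text above does not say how ties are ranked.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho reaches 1, Pearson's r does not.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
```

The cubic example shows the point made above: the ranks of `x` and `y` coincide, so ρ is 1 even though the raw values are not linearly related.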

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
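
The pair-counting definition can be sketched directly. The function name `kendall_tau_a` is hypothetical, and the sketch makes two assumptions: tied pairs count as neither concordant nor discordant, and the denominator is the full n(n-1)/2 pairs, exactly as in the sentence above. Library implementations often apply a tie correction (the τ-b variant) instead, so results on heavily tied columns such as PAY_0 may differ from the figures in a generated report.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, without tie correction."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts toward neither total
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations give three pairs: two concordant and one discordant.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

This loop is quadratic in the number of observations, which is acceptable for illustration but slow for the full 30000 rows.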

Missing values

Sample

First rows

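(index) | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default_next_mo
0 | 1 | 20000.0 | female | university | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913 | 3102 | 689 | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | default
1 | 2 | 120000.0 | female | university | 2 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682 | 1725 | 2682 | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | default
2 | 3 | 90000.0 | female | university | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239 | 14027 | 13559 | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | not default
3 | 4 | 50000.0 | female | university | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990 | 48233 | 49291 | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | not default
4 | 5 | 50000.0 | male | university | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617 | 5670 | 35835 | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | not default
5 | 6 | 50000.0 | male | graduate school | 2 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 64400 | 57069 | 57608 | 19394 | 19619 | 20024 | 2500 | 1815 | 657 | 1000 | 1000 | 800 | not default
6 | 7 | 500000.0 | male | graduate school | 2 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 367965 | 412023 | 445007 | 542653 | 483003 | 473944 | 55000 | 40000 | 38000 | 20239 | 13750 | 13770 | not default
7 | 8 | 100000.0 | female | university | 2 | 23 | 0 | -1 | -1 | 0 | 0 | -1 | 11876 | 380 | 601 | 221 | -159 | 567 | 380 | 601 | 0 | 581 | 1687 | 1542 | not default
8 | 9 | 140000.0 | female | high school | 1 | 28 | 0 | 0 | 2 | 0 | 0 | 0 | 11285 | 14096 | 12108 | 12211 | 11793 | 3719 | 3329 | 0 | 432 | 1000 | 1000 | 1000 | not default
9 | 10 | 20000.0 | male | high school | 2 | 35 | -2 | -2 | -2 | -2 | -1 | -1 | 0 | 0 | 0 | 0 | 13007 | 13912 | 0 | 0 | 0 | 13007 | 1122 | 0 | not default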

Last rows

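(index) | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default_next_mo
29990 | 29991 | 140000.0 | male | university | 1 | 41 | 0 | 0 | 0 | 0 | 0 | 0 | 138325 | 137142 | 139110 | 138262 | 49675 | 46121 | 6000 | 7000 | 4228 | 1505 | 2000 | 2000 | not default
29991 | 29992 | 210000.0 | male | university | 1 | 34 | 3 | 2 | 2 | 2 | 2 | 2 | 2500 | 2500 | 2500 | 2500 | 2500 | 2500 | 0 | 0 | 0 | 0 | 0 | 0 | default
29992 | 29993 | 10000.0 | male | high school | 1 | 43 | 0 | 0 | 0 | -2 | -2 | -2 | 8802 | 10400 | 0 | 0 | 0 | 0 | 2000 | 0 | 0 | 0 | 0 | 0 | not default
29993 | 29994 | 100000.0 | male | graduate school | 2 | 38 | 0 | -1 | -1 | 0 | 0 | 0 | 3042 | 1427 | 102996 | 70626 | 69473 | 55004 | 2000 | 111784 | 4000 | 3000 | 2000 | 2000 | not default
29994 | 29995 | 80000.0 | male | university | 2 | 34 | 2 | 2 | 2 | 2 | 2 | 2 | 72557 | 77708 | 79384 | 77519 | 82607 | 81158 | 7000 | 3500 | 0 | 7000 | 0 | 4000 | default
29995 | 29996 | 220000.0 | male | high school | 1 | 39 | 0 | 0 | 0 | 0 | 0 | 0 | 188948 | 192815 | 208365 | 88004 | 31237 | 15980 | 8500 | 20000 | 5003 | 3047 | 5000 | 1000 | not default
29996 | 29997 | 150000.0 | male | high school | 2 | 43 | -1 | -1 | -1 | -1 | 0 | 0 | 1683 | 1828 | 3502 | 8979 | 5190 | 0 | 1837 | 3526 | 8998 | 129 | 0 | 0 | not default
29997 | 29998 | 30000.0 | male | university | 2 | 37 | 4 | 3 | 2 | -1 | 0 | 0 | 3565 | 3356 | 2758 | 20878 | 20582 | 19357 | 0 | 0 | 22000 | 4200 | 2000 | 3100 | default
29998 | 29999 | 80000.0 | male | high school | 1 | 41 | 1 | -1 | 0 | 0 | 0 | -1 | -1645 | 78379 | 76304 | 52774 | 11855 | 48944 | 85900 | 3409 | 1178 | 1926 | 52964 | 1804 | default
29999 | 30000 | 50000.0 | male | university | 1 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 47929 | 48905 | 49764 | 36535 | 32428 | 15313 | 2078 | 1800 | 1430 | 1000 | 1000 | 1000 | default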